Average Size of a Suffix Tree for Markov Sources

نویسندگان

  • Philippe Jacquet
  • Wojciech Szpankowski
چکیده

We study a suffix tree built from a sequence generated by a Markovian source. Such sources are more realistic probabilistic models for text generation, data compression, molecular applications, and so forth. We prove that the average size of such a suffix tree is asymptotically equivalent to the average size of a trie built over n independent sequences from the same Markovian source. This equivalence is only known for memoryless sources. We then derive a formula for the size of a trie under Markovian model to complete the analysis for suffix trees. We accomplish our goal by applying some novel techniques of analytic combinatorics on words also known as analytic pattern matching.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Analysis of the average depth in a suffix tree under a Markov model

In this report, we prove that under a Markovian model of order one, the average depth of suffix trees of index n is asymptotically similar to the average depth of tries (a.k.a. digital trees) built on n independent strings. This leads to an asymptotic behavior of (log n)/h + C for the average of the depth of the suffix tree, where h is the entropy of the Markov model and C is constant. Our proo...

متن کامل

Suffix Trees and Simple Sources

Using an intricate method, Jacquet and Szpankowski [2] compared the depth of insertion into suffix-trees and tries in the non-uniform Bernoulli model, as well as the average size of suffix-trees and tries under the same model. They proved that the depth of insertion has asymptotically the same probabilistic behaviour in both cases, and that the average sizes of a trie and a suffix-tree built wi...

متن کامل

Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth

Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...

متن کامل

Uncommon Suffix Tries

Common assumptions on the source producing the words inserted in a suffix trie with n leaves lead to a lnn height and saturation level. We provide an example of a suffix trie whose height increases faster than a power of n and another one whose saturation level is negligible with respect to lnn. Both are built from VLMC (Variable Length Markov Chain) probabilistic sources and are easily extende...

متن کامل

Class-basedvariable Memory L

In this paper, we present a class-based variable memory length Markov model and its learning algorithm. This is an extension of a variable memory length Markov model. Our model is based on a class-based probabilistic suffix tree, whose nodes have an automatically acquired wordclass relation. We experimentally compared our new model with a word-based bi-gram model, a word-based tri-gram model, a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1605.02123  شماره 

صفحات  -

تاریخ انتشار 2016